Large dataset partitioning using ensemble partition-based clustering with majority voting technique

Abstract

Large datasets have become useful in data mining for processing, storing, and handling vast amounts of data. However, processing large datasets is time-consuming and memory intensive. As a result, researchers have adopted the data partitioning strategy to improve controllability and performance and to reduce the time required to handle large datasets. Unfortunately, the numerous clustering techniques available in the literature could confuse experts choosing the best technique for a given dataset. Furthermore, no single clustering technique can tackle all problems, such as cluster structure, noise, or density. To manage large datasets, existing clustering techniques need scalable solutions. Therefore, this paper proposes an ensemble partition-based clustering with majority voting technique for large dataset partitioning, using the aggregation of k-means, k-medoids, fuzzy c-means, expectation-maximization (EM), and density-based spatial clustering of applications with noise (DBSCAN) techniques. These techniques are run individually in the first stage. The final clusters are discovered in the next stage through majority voting among the five algorithms: instances are assigned to the cluster with the most votes. The experimental findings demonstrate that the proposed method surpasses other techniques in terms of execution time and accuracy.
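The second stage described in the abstract, majority voting over the outputs of several clustering algorithms, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not say how cluster ids are matched across algorithms, so the brute-force relabeling in `align_labels` is an assumption, and the function names and sample label lists are hypothetical. DBSCAN noise points are not handled here.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, k):
    """Relabel `labels` so it agrees with `reference` as often as possible.

    Cluster ids are arbitrary per algorithm, so they must be matched before
    voting. This tries every permutation of the k ids, which is only
    practical for small k (assumption: ids are integers 0..k-1).
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def majority_vote(partitions, k):
    """Combine several label vectors into one consensus labeling.

    The first partition serves as the reference labeling; every other
    partition is aligned to it, then each instance takes the most common id.
    """
    reference = partitions[0]
    aligned = [list(reference)]
    aligned += [align_labels(reference, p, k) for p in partitions[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


# Made-up outputs of five clustering runs on six instances (k = 2).
partitions = [
    [0, 0, 0, 1, 1, 1],  # reference labeling
    [1, 1, 1, 0, 0, 0],  # same grouping, ids swapped
    [0, 0, 1, 1, 1, 1],  # disagrees on instance 2
    [1, 1, 1, 0, 0, 0],  # same grouping, ids swapped
    [0, 0, 0, 0, 1, 1],  # disagrees on instance 3
]
consensus = majority_vote(partitions, k=2)
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

In a fuller pipeline the five label vectors would come from the individual first-stage runs; for larger numbers of clusters the permutation search would need to be replaced by an assignment method.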

Similar resources

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

XML Document Partitioning using Ensemble Clustering

In this paper we propose a new technique for partitioning XML documents, in which conventional clustering techniques operating on flattened representations of individual aspects of the XML documents are combined to partition the available XML corpus. This offers the potential to divide the problem of catching content and structural regularities into simpler subproblems, in which only individual...

Partition Selection Approach for Hierarchical Clustering Based on Clustering Ensemble

Hierarchical clustering algorithms are widely used in many fields of investigation. They provide a hierarchy of partitions of the same dataset. However, in many practical problems, the selection of a representative level (partition) in the hierarchy is needed. The classical approach to do so is by using a cluster validity index to select the best partition according to the criterion imposed by ...

The ensemble clustering with maximize diversity using evolutionary optimization algorithms

Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...

A Review of Ensemble Technique for Improving Majority Voting for Classifier

Data classification plays an important role in the field of data mining. The increasing rate of data diversity and size decreases the performance and efficiency of classifiers. The decreasing performance of classifier compromised with unvoted data of classifier. Now the merging of two or more classifiers for better prediction and voting of data is used; such techniques are called Ensemble classifier...

Journal

Journal title: Indonesian Journal of Electrical Engineering and Computer Science

Year: 2023

ISSN: 2502-4752, 2502-4760

DOI: https://doi.org/10.11591/ijeecs.v29.i2.pp838-844